ABSTRACT

We have developed a wavelet-based method to recover the true underlying smooth density from a
point distribution. The goal has been to reconstruct the density field in an optimal way, ensuring that
the morphology of the reconstructed field reflects the true underlying morphology of the point field,
which, like the galaxy distribution, has a genuinely multiscale structure, with near-singular behavior on
sheets, filaments, and hotspots. If the discrete distributions are smoothed using Gaussian filters, the
morphological properties tend to be closer to those expected for a Gaussian field. Wavelet
denoising provides a unique and more accurate morphological description.
Subject headings: methods: statistical; galaxies: clustering; large-scale structure of Universe 



1. INTRODUCTION 

The large-scale structure of the universe shows intri- 
cate patterns with filaments, clusters, and sheet-like ar- 
rangements of galaxies encompassing large nearly empty 
regions, the so-called voids. This complex structure 
shows clearly non-Gaussian features. However, it is likely 
that the observed structure developed from tiny fluctu- 
ations of an initial Gaussian random field by the action 
of gravity. This is the scenario suggested by the analy- 
sis of the maps of the microwave background radiation. 
Thus, it is important to check if the present large-scale 
structure is compatible with the Gaussianity of the initial 
fluctuations. 

Different statistical measures have been used in the cosmological literature to describe the
cosmic texture quantitatively (Martínez & Saar 2002). To complement the information provided by
the second-order descriptors (the correlation function and the power spectrum), different
alternatives have been proposed. Some of these statistics focus on quantifying geometrical and
morphological aspects of the distribution. In this context, the genus, introduced to measure
deviations from Gaussianity (Gott et al. 1986), is one of the most widely used techniques. The
genus and its generalization, the Minkowski functionals, allow us to quantify the morphology of
the isodensity surfaces of the matter distribution.

The Minkowski functionals describe the morphology of 
hypersurfaces, with dimensionality one less than that of 
the encompassing space. In the analysis of the three- 
dimensional matter distribution, the functionals are ap- 
plied to isodensity surfaces separating regions with den- 
sity above and below a given threshold. This implies that 
the first step is to obtain a smooth density field from 
the discrete distribution of matter. The morphological descriptors can be applied both to the
observed galaxy distribution and to the dark-matter-based N-body simulations of the large-scale
structure. In all cases, we have to smooth the data to construct a real density field. This
smoothing has to be more severe when we want to measure the morphology of a discrete point
distribution, such as a redshift catalog, than when we measure the morphology of the dark matter
distribution in cosmological simulations.

The first assumption we have to make is that the galaxy distribution is a sample from a Cox
process (see, e.g., Martínez & Saar 2002): the galaxy positions $(x_i, y_i, z_i)$,
$i = 1,\ldots,n$, represent a point process which samples the continuous field. Smoothing has to
reconstruct the underlying field $f(x,y,z)$, and depending on whether the smoothing is done well
or poorly, the estimated field $\hat{f}$ will be either good or bad at representing the true
underlying field $f$.

It is well known that when we are estimating a density field $f$, there is a critical smoothing
level at which it begins to be true that the estimated field resembles the true field. For
example, with a $C^2$ density in dimension 1, we need to smooth with the bandwidth

$$h_n \sim a\, n^{-1/5}, \tag{1}$$

where $a$ depends on the underlying density field $f$ (Donoho 1988). If we smooth less than this,
$h < h_n$, then the number of modes of the estimate $\hat{f}$ will tend to infinity, while if we
smooth more than this, the estimate will have fewer modes than the true density (Silverman
1981).
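The dependence of the number of modes on the bandwidth can be illustrated with a minimal one-dimensional sketch. The bimodal test density, the sample size, the evaluation grid, and the three bandwidths below are all assumptions chosen for illustration; they are not taken from the paper.

```python
import math
import random


def gaussian_kde(sample, h, x):
    """Gaussian kernel density estimate at x with bandwidth h."""
    norm = 1.0 / (len(sample) * h * math.sqrt(2.0 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in sample)


def count_modes(sample, h, grid):
    """Count strict local maxima of the estimate evaluated on a grid."""
    values = [gaussian_kde(sample, h, x) for x in grid]
    return sum(
        1
        for i in range(1, len(values) - 1)
        if values[i - 1] < values[i] > values[i + 1]
    )


random.seed(1)
# Hypothetical test density: an equal mixture of two unit-variance Gaussians.
sample = [random.gauss(-2.0, 1.0) for _ in range(150)]
sample += [random.gauss(2.0, 1.0) for _ in range(150)]
grid = [-6.0 + 0.02 * i for i in range(601)]

# Undersmoothed, intermediate, and oversmoothed bandwidths (illustrative values).
for h in (0.05, 0.5, 4.0):
    print("h =", h, "modes =", count_modes(sample, h, grid))
```

With the smallest bandwidth the estimate breaks up into many spurious modes, while the largest bandwidth merges the two true modes into one, which is the behavior described above.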

Cosmological density fields, instead of being a generic $C^2$ field, are more complex in nature.
They have a genuinely multiscale structure, with near-singular behavior in sheets, filaments, and
clusters. The smoothing procedures that are proper for such objects are presumably entirely
different from the smoothing that is good for $C^2$ objects, and so we do not expect that Eq. (1)
can be applied in this setting.

Correct smoothing should also be spatially adaptive, 
so that locally it is using a scale based on the degree of 
smoothness of the object, or the scale should be smaller 
than the statistically significant structures. The method 
advocated in this paper, based on wavelet thresholding, 
does this automatically and provides a smoothing recipe 
that is unique for a given realization of a point process 
and does not depend on an a priori chosen bandwidth. 

Having obtained a consistent estimate of the density 
field, we can be certain that the morphology of the re- 
constructed field reflects the true underlying morphology 
of the point field. The goal of the present paper is finding 
and analyzing such morphological descriptors. 

2. SMOOTHING SCHEMES 

In this section we will introduce two different smooth- 
ing techniques that can be applied to obtain a continuous 
density field from a discrete point distribution. Our goal 
is to analyze how these schemes affect the correct deter- 
mination of the Minkowski functionals, and which is the 
best to study the morphology of the matter distribution. 

2.1. Gaussian smoothing 

For morphological studies, smoothing is typically done by using a Gaussian kernel

$$W(\mathbf{x}) = \frac{1}{(2\pi)^{3/2}\sigma^3}\exp\left(-\frac{\mathbf{x}^2}{2\sigma^2}\right). \tag{2}$$

The window width $\sigma$ is the parameter that governs the level of smoothing of the discrete
data to obtain the kernel density estimate. Hamilton et al. (1986) recommend that the smoothing
length be chosen larger than the correlation length $r_0$, the distance at which the two-point
correlation function $\xi(r_0) = 1$. This is the recipe that is usually used for the
morphological analysis of the observed galaxy distribution, together with a requirement that the
smoothing length should also be larger than the typical size of the volume-per-galaxy$^1$
$d = (V/n)^{1/3}$, where $V$ is the total volume of the sample and $n$ the number of points
(galaxies) (see, e.g., Hoyle et al. 2002).

A lot of work has been done by the statistical community on the optimal smoothing length that
would give the best density estimate. However, as Silverman (1981) has pointed out: "Most methods
seem to depend on some arbitrary choice of the scale of the effects being studied". Choosing the
appropriate value of $\sigma$ is certainly an art, but in any case we must avoid two kinds of
artifacts: undersmoothing, which causes huge numbers of spurious oscillations, and oversmoothing,
which removes real features of the structure. This last aspect is crucial when measuring the
morphology of the large-scale structure because, since smoothing has to be large enough to
describe morphology reliably, it will inevitably erase small-scale non-Gaussian features. Coles &
Lucchin (1995) note that "smoothing on scales much larger than the scale at which correlations are
significant will tend to produce a Gaussian distribution by virtue of the central limit theorem"
(Martínez et al. 1993). A conservative approach is based on searching for efficient and consistent
estimates of the bandwidth that are typically upper bounds. These scales would reveal as much
detail as the optimal bandwidth, if it exists (Donoho 1988).

2.2. Wavelet denoising

The Undecimated Isotropic Wavelet Transform (UIWT), also known as the à trous algorithm,
decomposes an $n \times n \times n$ data set $D$ as a superposition of the form

$$D = c_J + \sum_{j=1}^{J} w_j,$$

where $c_J$ is a coarse or smooth version of the original data $D$ and $w_j$ represents the
details of $D$ at scale $2^{-j}$ (see Starck et al. 1998; Starck & Murtagh 2002 for details).
Thus, the algorithm outputs $J + 1$ sub-band arrays of size $n \times n \times n$. We will use an
indexing convention such that $j = 1$ corresponds to the finest scale (high frequencies). Wavelets
have been used successfully for denoising via non-linear filtering or thresholding methods
(Starck & Murtagh 2002). Hard thresholding, for instance, consists of setting all insignificant
coefficients (i.e., coefficients with an absolute value below a given threshold) to zero.
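The decomposition $D = c_J + \sum_j w_j$ and hard thresholding can be sketched in one dimension as follows. This is only an illustration: the B3-spline kernel $(1, 4, 6, 4, 1)/16$ and the mirror boundary handling are assumptions commonly made for the à trous algorithm, not details stated here, and the function names are hypothetical. The paper works with three-dimensional arrays.

```python
def a_trous_1d(data, n_scales):
    """Return ([w_1, ..., w_J], c_J) such that data = c_J + sum_j w_j.

    Assumes the B3-spline kernel (1, 4, 6, 4, 1)/16 and mirror boundaries.
    """
    kernel = (1 / 16, 4 / 16, 6 / 16, 4 / 16, 1 / 16)
    n = len(data)

    def mirror(i):
        # Reflect out-of-range indices back into [0, n - 1].
        period = 2 * (n - 1)
        i = abs(i) % period
        return period - i if i >= n else i

    smooth = list(data)
    details = []
    for j in range(n_scales):
        step = 2 ** j  # the "holes" between kernel taps double at each scale
        coarser = [
            sum(k * smooth[mirror(i + (t - 2) * step)] for t, k in enumerate(kernel))
            for i in range(n)
        ]
        # Detail band: what the extra smoothing step removed.
        details.append([s - c for s, c in zip(smooth, coarser)])
        smooth = coarser
    return details, smooth


def hard_threshold(coeffs, threshold):
    """Set coefficients with absolute value below the threshold to zero."""
    return [c if abs(c) >= threshold else 0.0 for c in coeffs]


signal = [0.0, 1.0, 4.0, 2.0, 7.0, 3.0, 3.0, 8.0, 1.0, 0.0, 5.0, 2.0, 6.0, 6.0, 1.0, 9.0]
details, smooth = a_trous_1d(signal, 3)
# Summing all bands recovers the input, because the detail bands telescope.
recon = [smooth[i] + sum(w[i] for w in details) for i in range(len(signal))]
print(max(abs(r - s) for r, s in zip(recon, signal)))
```

Because the transform is undecimated, every band has the same length as the input, and reconstruction is a plain sum of the bands.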

For the noise model, given that this relates to point pattern clustering, we have to consider the
Poisson noise case. The autoconvolution histogram method (Slezak et al. 1993), used for X-ray
images (Starck & Pierre 1998; Pierre et al. 2004; Valtchanov et al. 2004), can also be used here.
It consists of numerically calculating the probability distribution function (pdf) of a wavelet
coefficient $w_{j,x,y,z}$ under the hypothesis that the galaxies used for obtaining $w_{j,x,y,z}$
are randomly distributed. The pdf is obtained by autoconvolving $n$ times the histogram of the
wavelet function, $n$ being the number of galaxies which have been used for obtaining
$w_{j,x,y,z}$, i.e., the number of galaxies in a box around $(x, y, z)$, the size of the box
depending on the scale $j$. More details can be found in Starck & Pierre (1998) and Starck &
Murtagh (2002).

$^1$ $d$ is typically referred to as the mean interparticle separation.

Once the pdf relative to the coefficient $w_{j,x,y,z}$ is known, we can detect the significant
wavelet coefficients easily. We derive two threshold values $T^{\min}_{j,x,y,z}$ and
$T^{\max}_{j,x,y,z}$ such that

$$\mathrm{Prob}(W < T^{\min}_{j,x,y,z}) = \epsilon,$$
$$\mathrm{Prob}(W > T^{\max}_{j,x,y,z}) = \epsilon, \tag{3}$$

$\epsilon$ corresponding to the confidence level, and a positive (respectively negative) wavelet
coefficient is significant if it is larger than $T^{\max}_{j,x,y,z}$ (resp. lower than
$T^{\min}_{j,x,y,z}$). Denoting by $D$ the noisy data and by $\delta$ the thresholding operator,
the filtered data $\tilde{D}$ are obtained by

$$\tilde{D} = \mathcal{R}\,\delta(\mathcal{T} D), \tag{4}$$

where $\mathcal{T}$ is the wavelet transform operator and $\mathcal{R}$ is the wavelet
reconstruction operator. In practice, we get better results using the iterative reconstruction
described in Starck & Murtagh (2002), which minimizes the $\ell_1$ norm of the wavelet
coefficients. It is this iterative technique that we have used for our experiments.
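The threshold derivation of Eq. (3) can be sketched as follows, assuming the histogram of the wavelet function is given as a probability mass function on equally spaced bins. The two-bin toy histogram, the value of $\epsilon$, and the function names are hypothetical; "autoconvolving $n$ times" is read here as forming the distribution of a sum of $n$ independent draws. On a discrete grid the tail probabilities can only be bounded by $\epsilon$, not matched exactly.

```python
def convolve(p, q):
    """Discrete convolution of two probability mass functions (lists)."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out


def coefficient_pmf(hist, n_events):
    """pmf of the sum of n_events independent draws from hist."""
    pmf = list(hist)
    for _ in range(n_events - 1):
        pmf = convolve(pmf, hist)
    return pmf


def tail_thresholds(pmf, first_value, bin_width, eps):
    """Return (t_min, t_max) with Prob(W < t_min) <= eps and Prob(W > t_max) <= eps."""
    values = [first_value + k * bin_width for k in range(len(pmf))]
    # Largest grid value t_min whose lower tail stays within eps.
    below, k_min = 0.0, 0
    for k in range(len(pmf)):
        if below > eps:
            break
        k_min = k
        below += pmf[k]
    # Smallest grid value t_max whose upper tail stays within eps.
    above, k_max = 0.0, len(pmf) - 1
    for k in range(len(pmf) - 1, -1, -1):
        if above > eps:
            break
        k_max = k
        above += pmf[k]
    return values[k_min], values[k_max]


# Toy histogram (assumption): the wavelet function takes the values -1 and +1
# with equal probability, and n = 4 events contribute to the coefficient.
hist, hist_first_value, bin_width, n_events = [0.5, 0.5], -1.0, 2.0, 4
pmf = coefficient_pmf(hist, n_events)
t_min, t_max = tail_thresholds(pmf, n_events * hist_first_value, bin_width, eps=0.1)
print(pmf, t_min, t_max)
```

A coefficient would then be kept only if it lies above `t_max` or below `t_min`; in practice the thresholds depend on the scale and on the local number of events.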

Poisson noise denoising has been addressed in a series of recent papers (Fryzlewicz & Nason 2004;
Kolaczyk 2000, 1999; Nowak & Baraniuk 1993; Antoniadis & Sapatinas 2001; Timmermann & Nowak
1999; Jammal & Bijaoui 2004). All of them use the Haar wavelet transform because it has the
interesting property that a Haar wavelet coefficient is the difference between two variables
which follow a Poisson distribution. This property allows us to derive an analytical form of the
pdf of the wavelet coefficients. The Haar transform has, however, several drawbacks, such as
block artifact creation or a tendency to create square structures. For XMM, it was also shown
that the isotropic wavelet transform was much more powerful for detecting clusters of galaxies
(Valtchanov et al. 2001).

3. MORPHOLOGICAL DESCRIPTORS 

3.1. The genus curve 

Historically, the first morphological descriptor used was the genus (Gott et al. 1986). The genus
$G(S)$ measures the connectivity of a surface $S$ with holes and disconnected pieces by the
difference between the number of holes and the number of isolated regions:

$$G(S) = \text{number of holes} - \text{number of isolated regions} + 1.$$

The genus of a sphere is $G = 0$, a torus or a sphere with a handle has the genus $G = +1$, a
sphere with $N$ handles has the genus $G = +N$, while a collection of $N$ disjoint spheres has
the genus $G = -(N - 1)$. The genus describes the topology of the isodensity surfaces; thus its
study is frequently called "topological analysis" in the cosmological literature.

The genus curve is usually parameterized by one of two related quantities: the filling factor,
$f$, which is the fraction of the survey volume above the density threshold, or, alternatively,
the quantity $\nu$ defined by

$$f = \frac{1}{\sqrt{2\pi}}\int_{\nu}^{\infty} e^{-t^2/2}\,dt. \tag{5}$$

In the case of a Gaussian random field, $\nu$ is also the number of standard deviations by which
the threshold density departs from the mean density, and with this parametrization, the genus per
unit volume of a surface $S$ corresponding to a given density threshold, $g = (G(S) - 1)/V$,
follows the analytical expression

$$g(\nu) = N\,(1 - \nu^2)\exp\left(-\frac{\nu^2}{2}\right), \tag{6}$$

where the amplitude $N$ depends on the power spectrum of the random field (Hamilton et al. 1986).
If the density distribution is not Gaussian, the parameterization (5) eliminates the (trivial)
non-Gaussianity caused by the one-point density distribution. Opinions differ about which
argument is better; we shall use $\nu$ in this paper.
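The relation between the filling factor $f$ and the threshold parameter $\nu$ is the upper tail of the standard normal distribution, so it can be inverted numerically. The sketch below uses Python's standard `statistics.NormalDist`; the function names are hypothetical, and the three filling factors are the volume fractions quoted for the isodensity figures.

```python
from statistics import NormalDist

_STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def nu_from_filling_factor(f):
    """Invert f = (2*pi)**-0.5 * integral from nu to infinity of exp(-t*t/2) dt."""
    return _STANDARD_NORMAL.inv_cdf(1.0 - f)


def filling_factor_from_nu(nu):
    """Fraction of the volume above the threshold nu."""
    return 1.0 - _STANDARD_NORMAL.cdf(nu)


for f in (0.07, 0.50, 0.93):
    print(f, round(nu_from_filling_factor(f), 3))
```

A filling factor of 50% corresponds to $\nu = 0$, and the 7% and 93% thresholds are placed symmetrically about it.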

This curve, symmetric about $\nu = 0$, is typical of the random-phase morphology. We have
simulated a Gaussian random field with a power-law spectrum $P(k) \sim k^{-1}$, and this field has
been smoothed with a Gaussian kernel with $\sigma = 3$ (the cube size is 128). As we see in
Fig. 1 (left panels), the regions with density above or below the mean value are statistically
indistinguishable. In the right column of this figure we show the isodensity surfaces for our
realization, which encompass the denser regions of the simulated box. The three panels, from top
to bottom, correspond respectively to 7%, 50%, and 93% of the volume encompassing regions with
higher density. Likewise, the left column shows the low-density regions corresponding to the same
percentage of the volume. The symmetry between the high-density and the low-density regions is
clearly seen. The right panels of Fig. 1 depict the same realization, but more heavily smoothed,
with the smoothing length $\sigma = 8$. These are the standard distributions, which are typically
compared with observational data. Such a morphology is usually called "the sponge morphology".
The sponginess of the isodensity surfaces is clearly seen, particularly in the central pair of
panels, in both figures, corresponding to the 50% low and high densities: the surface separating
both regions has many holes, is multiply connected, and has negative curvature.

Other types of genus curves can be found in the cos- 
mological literature. When rich clusters dominate the 
distribution, the genus curves are shifted to the left, and 
the morphology is referred to as "meat-ball" , while the 
expression "Swiss-cheese" is used for right-shifted genus 
curves corresponding to distributions with empty bub- 
bles surrounded by a single high density region. 

3.2. Minkowski functionals

An elegant generalization of the genus statistic is to consider this measure as one of the four
Minkowski functionals, which describe different morphological aspects of the galaxy distribution
(Mecke et al. 1994). These functionals provide a complete family of morphological measures: all
additive, motion-invariant, and conditionally continuous functionals defined for any hypersurface
are linear combinations of its Minkowski functionals.

The Minkowski functionals (MF for short) describe the morphology of isodensity surfaces
(Minkowski 1903; Tomita 1990), and thus depend on two factors: the smoothing procedure and the
specific density level (see Sheth & Sahni 2005 for a recent review). An alternative approach
starts from the point field, decorating the points with spheres of the same radius, and studying
the morphology of the resulting surface (Schmalzing et al. 1996; Kerscher et al. 1997). These
functionals depend only on one parameter (the radius of the spheres), but this approach does not
refer to a density; we shall not use it for the present study.

Fig. 1. The two columns on the left show the spatial distribution of the low- (first column) and
high-density (second column) regions for a realization of a Gaussian random field, with
comparatively little smoothing ($\sigma = 3$). The upper pair shows the 7% (volume fraction) low,
93% high density regions, the middle pair stands for 50%-50%, and the lower pair shows the 93%
low-density, 7% high-density case. The two columns on the right are the same, but for heavy
smoothing ($\sigma = 8$).

The Minkowski functionals are defined as follows. Consider an excursion set $F_{\phi_0}$ of a
field $\phi(\mathbf{x})$ in 3-D (the set of all points where $\phi(\mathbf{x}) > \phi_0$). Then,
the first Minkowski functional (the volume functional) is the volume of the excursion set:

$$V_0(\phi_0) = \int_{F_{\phi_0}} d^3x.$$

The second MF is proportional to the surface area of the boundary $\partial F_{\phi_0}$ of the
excursion set:

$$V_1(\phi_0) = \frac{1}{6}\int_{\partial F_{\phi_0}} dS(\mathbf{x}).$$

The third MF is proportional to the integrated mean curvature of the boundary:

$$V_2(\phi_0) = \frac{1}{6\pi}\int_{\partial F_{\phi_0}} \left(\frac{1}{R_1(\mathbf{x})} + \frac{1}{R_2(\mathbf{x})}\right) dS(\mathbf{x}),$$

where $R_1$ and $R_2$ are the principal radii of curvature of the boundary. The fourth Minkowski
functional is proportional to the integrated Gaussian curvature (the Euler characteristic) of the
boundary:

$$V_3(\phi_0) = \frac{1}{4\pi}\int_{\partial F_{\phi_0}} \frac{1}{R_1(\mathbf{x})\,R_2(\mathbf{x})}\, dS(\mathbf{x}).$$

The last MF is simply related to the morphological genus $G$ introduced in the previous
subsection by

$$V_3 = \chi = \frac{1}{2}(1 - G)$$

($\chi$ is the usual notation for the Euler characteristic). The functional $V_3$ is a bit more
convenient to use: it is additive, while $G$ is not, and it gives just twice the number of
isolated balls (or holes). Although the genus continues to be widely used, in several recent
papers many authors have chosen to present the Minkowski functional $V_3$; we shall follow this
recent and logical trend. Instead of the functionals, their spatial densities $v_i$ are
frequently used:

$$v_i(f) = V_i(f)/V, \quad i = 0, \ldots, 3,$$

where $V$ is the total sample volume.

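All the Minkowski functionals have analytic expressions for isodensity slices of realizations of
Gaussian random fields. For three-dimensional space they are:

$$v_0 = \frac{1}{2} - \frac{1}{2}\,\Phi\left(\frac{\nu}{\sqrt{2}}\right),$$
$$v_1 = \frac{2}{3}\,\frac{\lambda}{\sqrt{2\pi}}\exp\left(-\frac{\nu^2}{2}\right),$$
$$v_2 = \frac{2}{3}\,\frac{\lambda^2}{\sqrt{2\pi}}\,\nu\exp\left(-\frac{\nu^2}{2}\right),$$
$$v_3 = \frac{\lambda^3}{\sqrt{2\pi}}\,(\nu^2 - 1)\exp\left(-\frac{\nu^2}{2}\right),$$

where $\Phi(\cdot)$ is the Gaussian error integral, and $\lambda$ is determined by the
correlation function $\xi(r)$ of the field as:

$$\lambda^2 = \frac{1}{2\pi}\,\frac{\xi''(0)}{\xi(0)}.$$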
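For comparison with measured curves, the Gaussian-field predictions can be evaluated directly. The sketch below assumes the standard three-dimensional expressions from the literature cited in this section (Tomita 1990; Schmalzing & Buchert 1997), with the Gaussian error integral taken to be `math.erf` and the scale parameter $\lambda$ passed in as a plain number; the function name is hypothetical.

```python
import math


def gaussian_mf_densities(nu, lam):
    """Densities (v0, v1, v2, v3) of the Minkowski functionals of a Gaussian field.

    nu  : threshold in units of the standard deviation
    lam : the scale parameter lambda (an inverse length)
    """
    gauss = math.exp(-0.5 * nu * nu) / math.sqrt(2.0 * math.pi)
    v0 = 0.5 - 0.5 * math.erf(nu / math.sqrt(2.0))  # volume fraction above nu
    v1 = (2.0 / 3.0) * lam * gauss
    v2 = (2.0 / 3.0) * lam ** 2 * nu * gauss
    v3 = lam ** 3 * (nu * nu - 1.0) * gauss
    return v0, v1, v2, v3


for nu in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(nu, gaussian_mf_densities(nu, lam=0.1))
```

The $v_3$ curve is symmetric in $\nu$, negative between $\nu = -1$ and $\nu = +1$ (the sponge-like regime), and positive outside that interval, where isolated clusters or voids dominate.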

3.3. Numerical algorithms 

Several algorithms are used to calculate the Minkowski functionals for a given density field and
a given density threshold. We can either try to follow exactly the geometry of the isodensity
surface, e.g., using triangulation (Sheth et al. 2003), or approximate the excursion set on a
simple cubic lattice. The algorithm that was proposed first, by Gott et al. (1986), uses a
decomposition of the field into filled and empty cells, and another popular algorithm (Coles et
al. 1996) uses a grid-valued density distribution. The lattice-based algorithms are simpler and
faster, but not as accurate as the triangulation codes. The main difference is in the edge
effects: while surface triangulation algorithms do not suffer from these, edge effects may be
rather serious for the lattice algorithms.

We use a simple grid-based algorithm, based on integral geometry (Crofton's intersection formula;
see Schmalzing & Buchert 1997). We first find the density thresholds for given filling fractions
by sorting the grid densities. Vertices with higher densities than the threshold form the
excursion set. This set is characterized by its basic sets of different dimensions: points
(vertices), edges formed by two neighboring points, squares (faces) formed by four edges, and
cubes formed by six faces. The algorithm counts the numbers of all basic sets, and finds the
values of the Minkowski functionals as

$$V_0(f) = a^3 N_3(f),$$
$$V_1(f) = a^2\left(\frac{2}{9}N_2(f) - \frac{2}{3}N_3(f)\right),$$
$$V_2(f) = a\left(\frac{2}{9}N_1(f) - \frac{4}{9}N_2(f) + \frac{2}{3}N_3(f)\right),$$
$$V_3(f) = N_0(f) - N_1(f) + N_2(f) - N_3(f),$$

where $a$ is the grid step, $f$ is the filling factor, $N_0$ is the number of vertices, $N_1$ is
the number of edges, $N_2$ is the number of squares (faces), and $N_3$ is the number of basic
cubes in the excursion set for a given filling factor (density threshold). This formula was
proven by Adler (1981) and was first used in cosmological studies by Coles et al. (1996); we
refer to that paper for a thorough discussion of the method and of necessary boundary
corrections.

Fig. 2. The average genus curve for 50 realizations of a Gaussian random field with
$P(k) \sim k^{-1}$ together with the expected analytical result (solid line). The error bars show
$1\sigma$ deviations.

This algorithm is simple to program, and it gives ex- 
cellent results, provided the grid step is substantially 
smaller than the characteristic lengths of the isosurfaces 
(the smoothing length). This is needed to be able to 
accurately follow the geometry of the surface. It is also 
very fast, allowing the use of Monte-Carlo simulations 
for error estimation. 
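The counting step of such a grid-based scheme can be sketched as follows. This is a minimal illustration, not the program used for the paper: the excursion set is passed as a set of integer vertex coordinates, no boundary corrections or periodic wrapping are applied, and the function name is hypothetical. The coefficients are those of the lattice formulas given above.

```python
from itertools import product


def minkowski_on_grid(occupied, a=1.0):
    """Return (V0, V1, V2, V3) for a set of (i, j, k) vertices in the excursion set."""
    occupied = set(occupied)
    axes = ((1, 0, 0), (0, 1, 0), (0, 0, 1))

    def shifted(p, *steps):
        # Move vertex p by the sum of the given lattice steps.
        return tuple(c + sum(s[m] for s in steps) for m, c in enumerate(p))

    n0 = len(occupied)
    n1 = n2 = n3 = 0
    for p in occupied:
        # Edge: p and its neighbour one step along an axis are both occupied.
        for u in axes:
            n1 += shifted(p, u) in occupied
        # Face: all four corners of the unit square spanned by two axes.
        for u, v in ((axes[0], axes[1]), (axes[0], axes[2]), (axes[1], axes[2])):
            n2 += all(
                q in occupied
                for q in (shifted(p, u), shifted(p, v), shifted(p, u, v))
            )
        # Cube: all eight corners of the unit cube with p as its lowest corner.
        n3 += all(shifted(p, d) in occupied for d in product((0, 1), repeat=3))

    v0 = a ** 3 * n3
    v1 = a ** 2 * (2.0 / 9.0 * n2 - 2.0 / 3.0 * n3)
    v2 = a * (2.0 / 9.0 * n1 - 4.0 / 9.0 * n2 + 2.0 / 3.0 * n3)
    v3 = n0 - n1 + n2 - n3
    return v0, v1, v2, v3


unit_cube = set(product((0, 1), repeat=3))
print(minkowski_on_grid(unit_cube))
```

Each basic set is counted once by attributing it to its lowest-coordinate vertex. For a single filled unit cube the counts are 8 vertices, 12 edges, 6 faces, and 1 cube, so $V_3 = 1$; two disjoint cubes give $V_3 = 2$, which illustrates the additivity of this functional.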

In order to test the algorithm and our program, we calculated the genus curve for 50 realizations
of a Gaussian random field with a power-law power spectrum $P(k) \sim k^{-1}$ in a $128^3$ box.
The realizations were smoothed with a Gaussian kernel of $\sigma = 3$. The results are shown in
Fig. 2. Our results are very close to the theoretical expectations, and the errors are similar to
those reported recently by Sheth et al. (2003), who used a very precise algorithm based on
triangulated surfaces (SURFGEN).

4. MINKOWSKI FUNCTIONALS OF SIMULATED POINT 
DISTRIBUTIONS 

In this section, we apply Gaussian smoothing and wavelet denoising procedures to three different
point sets. For the Gaussian kernel, we choose different values of the bandwidth $\sigma$. The
fourth Minkowski functional (the Euler-Poincaré characteristic $V_3$) is then calculated for the
smoothed density fields. We will see how Gaussian smoothing tends to bring the $V_3$ curve closer
to the one expected for a Gaussian random field, independently of the characteristics of the
initial field. This demonstrates that the morphological characteristics obtained by Gaussian
smoothing may carry more information about the filter itself than about the point process. We
have chosen different point processes with genuinely non-Gaussian features and with different
topologies.

4.1. Description of the samples 

The first data set used in this analysis has been generated by A. Klypin from an N-body
simulation, and has been used in wavelet applications before (see, e.g., Starck & Murtagh 2002,
p. 221). This simulation is described by Klypin & Holtzman (1997). It contains 14616 galaxies
within a cube of size 60 $h^{-1}$ Mpc. All three samples have similar numbers of data points, and
we calculate the Minkowski functionals for all three samples using a $128^3$ mesh. The
correlation length $r_0$ and the size of the volume per particle $d$ for this sample, both in
physical and grid units, are given in Table 1. We also give the mean nearest-neighbor distance
for the sample ($\langle nnd \rangle$). If a sample is not too heavily clustered, this should be
close to $d$.

TABLE 1
Simulated point distributions.

Sample       N       L      r_0     d       r_0'    d'     <nnd'>
nbody        14616   60     4.0     2.45    8.5     5.2    4.5
filaments    14718   100    10.0    4.1     12.8    5.2    2.6
cheese       14718   128    27.8    5.2     27.8    5.2    1.1

Note: The first three lengths ($L$, $r_0$ and $d$) are in units of $h^{-1}$ Mpc, the last three
lengths ($r_0'$, $d'$ and $\langle nnd' \rangle$) in grid units.
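As a worked check of these characteristic lengths, the volume-per-particle size $d = (V/n)^{1/3}$ and the conversion to grid units on a $128^3$ mesh can be computed from the point counts, box sizes, and correlation lengths quoted for the three samples. The helper names below are hypothetical.

```python
def volume_per_particle_length(box_size, n_points):
    """d = (V / n)**(1/3) for a cubic box of side box_size."""
    return box_size / n_points ** (1.0 / 3.0)


def to_grid_units(length, box_size, n_grid=128):
    """Convert a physical length to units of the grid step of an n_grid**3 mesh."""
    return length * n_grid / box_size


# (number of points, box size, correlation length r_0) as quoted for each sample.
samples = {
    "nbody": (14616, 60.0, 4.0),
    "filaments": (14718, 100.0, 10.0),
    "cheese": (14718, 128.0, 27.8),
}

for name, (n_points, box, r0) in samples.items():
    d = volume_per_particle_length(box, n_points)
    print(name, round(d, 2), round(to_grid_units(r0, box), 1), round(to_grid_units(d, box), 1))
```

For the N-body sample this gives $d \approx 2.45$ in physical units, and about 5.2 grid steps for $d$ and 8.5 grid steps for $r_0$.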

The second point process is based on Voronoi tessellation. We generate a Voronoi tessellation
similar to the observed large-scale galaxy distribution, with a mean cell size of 40 $h^{-1}$ Mpc
in a 100 $h^{-1}$ Mpc cube, and populate the edges of the cells (filaments). There are about 26
Voronoi cells; the sample contains 14718 points, all close to filaments, with an $r^{-2}$
cross-section density profile and a 3 $h^{-1}$ Mpc density scale. About 70-75% of the space is
empty. Table 1 gives the characteristic lengths for this sample.

We will call the third data set the "Swiss cheese" model. In a $128^3$ cube, we cut out 40 holes
with radii $R$ in [20, 40], with a uniform distribution of hole volumes. About 80-83% of the
sample volume is empty; the remaining volume is filled with a Poisson distribution of about 15000
points (see Table 1).
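A construction of this kind can be sketched as follows. Only the radius range, the uniform distribution of hole volumes, the number of holes, and the box size come from the description above; the uniformly random hole centers, the non-periodic box, the rejection sampling, the reduced number of points, and the function names are assumptions made for illustration.

```python
import random


def sample_hole_radii(n_holes, r_min, r_max, rng):
    """Radii whose volumes are uniformly distributed between those of r_min and r_max."""
    lo, hi = r_min ** 3, r_max ** 3
    return [(lo + rng.random() * (hi - lo)) ** (1.0 / 3.0) for _ in range(n_holes)]


def points_outside_holes(n_points, box, holes, rng, max_tries=None):
    """Uniform random points in a cubic box, rejecting those inside any hole.

    holes is a list of (cx, cy, cz, radius). At most max_tries candidates are
    drawn, so fewer than n_points may be returned if the holes fill the box.
    """
    if max_tries is None:
        max_tries = 200 * n_points
    points = []
    for _ in range(max_tries):
        if len(points) == n_points:
            break
        p = tuple(rng.random() * box for _ in range(3))
        if all(sum((pc - hc) ** 2 for pc, hc in zip(p, h[:3])) > h[3] ** 2 for h in holes):
            points.append(p)
    return points


rng = random.Random(42)
radii = sample_hole_radii(40, 20.0, 40.0, rng)
holes = [(rng.random() * 128, rng.random() * 128, rng.random() * 128, r) for r in radii]
points = points_outside_holes(500, 128.0, holes, rng)
print(len(points), "points kept outside", len(holes), "holes")
```

Drawing the cube of the radius uniformly, rather than the radius itself, is what makes the hole volumes uniformly distributed.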

These simulated galaxy distributions are shown in Fig. 3.

4.2. Smoothing and morphology 

In order to find the morphological descriptors (Minkowski functionals) for our realizations of
point processes, we have to smooth the data to obtain a continuous density field. The usual
approach is to use Gaussian kernels for smoothing; we shall compare the results with those
obtained by the wavelet-based smoothing scheme introduced in this paper. We calculated all the
MFs, but as the functional $V_3$ shows more details than the others, we show only the results for
this functional here. Fig. 4 shows, in the panels of the right column, the three point patterns
of Fig. 3 filtered by the 3D wavelet transform, using the algorithm described previously. The
left and middle panels of each row correspond to Gaussian smoothing with $\sigma = 1$ and
$\sigma = 3$ (in grid units), respectively.

We can clearly see that when the bandwidth is too small (left panels), discreteness and noise
dominate the reconstructed density fields, while using a larger value of $\sigma$ tends to erase
all the small-scale features of the distribution. This is also shown in Fig. 5, where we can see
that the morphology of the Gaussian-smoothed density field, as described by the Euler
characteristic $V_3$, depends strongly on the width of the Gaussian filter. This width is a free
parameter, and thus the Gaussian-filtered density field is not uniquely determined. By choosing
the width of the filter, we discard information on scales of that width and smaller. On the other
hand, the wavelet transform leads to a sparse representation of the density field and allows us
to detect and keep, at all scales, the coefficients which have the greatest probability of being
real. This is demonstrated by the 3D image in the right panels of Fig. 4, where we see, e.g., in
the rendering of the N-body model (top right) how large filaments, big clusters, and walls
coexist with small-scale features such as the density enhancements around groups and small
clusters. The Euler characteristic of this adaptively reconstructed density field is much more
informative because it is unique: it does not depend on the particular choice of the filter
radius. Because of that, wavelet morphology is clearly a more useful tool than the usual approach
of Gaussian smoothing. Also, the Minkowski functionals of Gaussian-smoothed density fields mimic
those of Gaussian random fields, in contrast with the wavelet-based approach. Thus, they describe
the properties of the filter more than the real morphology of the density distribution.

Fig. 3. The three data sets that will serve to illustrate the different smoothing schemes and
their implications when estimating the Euler characteristic. The top panel shows the N-body data,
the middle panel shows the Voronoi filament model, and the bottom panel shows the nearly-empty
Swiss cheese model.

Fig. 4. Rendering of the density fields obtained by smoothing the three data sets shown in Fig. 3
with a Gaussian filter with $\sigma = 1$ (first column) and $\sigma = 3$ (second column), and by
wavelet denoising (third column). The smoothing lengths are given in grid units.

This is seen already in the case of the N-body model (the top panel of Fig. 5), where the $V_3$
curve is close to Gaussian already for $\sigma = 3$ (in grid units), much smaller than $r_0$, and
even smaller than the mean nearest-neighbor distance.

For the clearly non-Gaussian Voronoi filament model, when we increase the value of $\sigma$, the
$V_3$ curves also approach the typical shape for a Gaussian field (see the middle panel of
Fig. 5), while the wavelet-denoised density shows the expected behavior for the Euler
characteristic for this kind of spatial configuration. The three curves shown in the middle panel
of Fig. 5 correspond to the isodensity contours shown in Fig. 6. While for Gaussian smoothing
with $\sigma = 3$ it is still possible to see the filamentary structure in the $V_3$ diagram, for
$\sigma = 8$ the isocontours are indistinguishable from those of a Gaussian field like the one
shown in Fig. 1. It is clear that such a smoothing is excessive, and destroys the original
morphology of the point sample. The $V_3$ curve is, in fact, close to Gaussian for $\sigma = 6$
already. Both $\sigma = 6$ and $\sigma = 8$ are smaller than the correlation length of this
sample (see Table 1), and $\sigma = 6$ is close to the size of the volume-per-particle $d$.

The nearly-empty Swiss cheese model is even more 
non-Gaussian, and therefore, even for large values of $\sigma$,
Gaussian smoothing does not converge to the symmet- 
ric V3 curve. Nevertheless, the shape of the curve for 
the Gaussian-smoothed density depends strongly on the 
bandwidth, and again the curve for the wavelet-denoised 
density is clearly more representative of the true under- 
lying morphology. 

5. MORPHOLOGY OF THE 2DFGRS 

5.1. Data 

The best available redshift catalog to study the morphology of the galaxy distribution at present
is the 2dF Galaxy Redshift Survey (2dFGRS) (Colless et al. 2003). It fills large compact
volume(s) in space and includes more than a quarter of a million galaxies. This is a flux-limited
catalog, and therefore the density of galaxies decreases with distance. For statistical analysis
of such surveys, a weighting scheme that compensates for the missing galaxies at large distances
has to be used. Usually, each galaxy is weighted by the inverse of the selection function
(Martínez & Saar 2002). However, the resulting densities will have different resolution at
different locations, and will not be suitable for morphological studies.

At the cost of discarding many surveyed galaxies, one can alternatively use volume-limited
samples. In this case, the variation in density at different locations depends only on the
fluctuations of the galaxy distribution itself. We have used the volume-limited samples prepared
by the 2dF team for scaling studies (Croton et al. 2004a,b), and kindly sent to us by Darren
Croton. As our basic sample, we chose the catalog with absolute luminosities in the range
$-19 > M_{b_J} - 5\log_{10} h > -20$ (the type-dependent $k + e$ correction (Norberg et al. 2002)
has been applied to the magnitudes). This sample contains galaxies with luminosity around $L^*$.
This catalog is the largest of the 2dF volume-limited catalogs, and as Baugh et al. (2004) point
out, it provides an optimal balance between the surveyed volume and the number density of
galaxies. Although the catalog does not suffer from luminosity incompleteness, it is slightly
spectroscopically incomplete, mainly due to missing galaxies because of fiber collisions. The
incompleteness parameter has been determined for every galaxy by the 2dF team; when calculating
densities, each galaxy can be weighted by the inverse of this parameter.

Fig. 5. The $V_3$ curves for the three point distributions. We show in each panel the curves
obtained by smoothing the data with a Gaussian with two different filter widths (in grid units)
and the MF $V_3$ for the wavelet-filtered data set. As previously, the top panel corresponds to
the N-body simulation, the middle panel is for the Voronoi filament model, and the bottom panel
corresponds to the nearly-empty Swiss cheese model.

Fig. 6. The isodensity surfaces corresponding to the Voronoi filament model for the
Gaussian-smoothed field with $\sigma = 3$ (upper row), $\sigma = 8$ (middle row, all in grid
units) and the wavelet-denoised field (bottom row). The density thresholds delineate, from left
to right, 7% low, 50% low, 50% high and 7% high density regions.



We split the volume-limited sample into the North- 
ern and Southern subsamples, and cut off the numerous 
whiskers in the plane of the sky to obtain compact vol- 
umes. 

We performed morphological analysis for both the 
Southern and Northern subsamples. The grid-based 
scheme we use works well for simple cuboid geometries. 
The geometry of the Northern sample is similar to a flat 
slice, while the Southern sample is enclosed between two 
cones of opening angles of 64.5° and 55.5°. When we 
tried to cut cuboidal volumes (bricks) from the Southern 
sample cone, we ended up with small brick volumes. So 
we carried out the morphological analysis for the full volume
of the Southern sample, only to find that the border
corrections for the Minkowski functionals are large and 
uncertain. Thus we report in this paper only the results 
of the analysis for the Northern sample. 

In order to obtain a compact volume, we choose the
angular limits for the Northern sample as $-4.5^\circ < \delta < 2.5^\circ$
and $149.0^\circ < \alpha < 209.0^\circ$. The slice lies between
two cones defined by the $\delta$ limits. The right ascension
limits cut the cones by planes from both sides, and there
are two additional cuts by two spheres. The radii of
the spheres are fixed by the original data, and depend
only on the chosen absolute magnitude limits (and on the
cosmological model). For our sample they are $R_1 = 61.1\,h^{-1}$ Mpc
and $R_2 = 375.6\,h^{-1}$ Mpc.

As this sample is rather flat, we cut from it a maximal-volume
cuboidal window, a "brick" with dimensions of
$254.0 \times 133.1 \times 31.1\,h^{-1}$ Mpc, containing 8487 galaxies (see
Fig. 7). This gives a per-particle-volume size of
$d = 5.0\,h^{-1}$ Mpc.
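The quoted value of $d$ can be checked with a few lines of arithmetic. The sketch below assumes that the per-particle-volume size is defined as $d = (V/N)^{1/3}$, which is consistent with the numbers in the text. The variable names are ours.

```python
# Per-particle-volume size d = (V / N)^(1/3) for the 2dF Northern brick.
# Side lengths (in h^-1 Mpc) and the galaxy count are the values quoted in the text;
# the definition of d as a cube root of the volume per galaxy is an assumption.
brick_sides = (254.0, 133.1, 31.1)
n_galaxies = 8487

volume = brick_sides[0] * brick_sides[1] * brick_sides[2]   # (h^-1 Mpc)^3
mean_density = n_galaxies / volume                          # galaxies per (h^-1 Mpc)^3
d = (volume / n_galaxies) ** (1.0 / 3.0)                    # h^-1 Mpc

print(f"V = {volume:.0f} (h^-1 Mpc)^3")
print(f"n = {mean_density:.5f} (h^-1 Mpc)^-3")
print(f"d = {d:.2f} h^-1 Mpc")
```

Rounded to one decimal place, this reproduces the $5.0\,h^{-1}$ Mpc given above.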

5.2. Mock catalogs 

In order to estimate sample errors of the Minkowski
functionals, we use mock catalogs provided by the 2dF
team. Norberg et al. (2002) created 22 mock catalogs
for the 2dFGRS that have been used by the 2dFGRS
team to measure the influence of cosmic variance on different
statistics, such as correlation functions, counts-in-cells,
the void probability function, clustering of groups, etc.
(Croton et al. 2004a,b; Baugh et al. 2004; Padilla et al.
2004). The mock catalogs were extracted from the Virgo
Consortium ΛCDM Hubble volume simulation, and a biasing
scheme described in Cole et al. (1998) was used to
populate the dark matter distribution with galaxies. The
catalogs were created by placing observers in the Hubble
volume, applying the radial and angular selection functions
of the 2dFGRS, and translating the positions and
velocities of galaxies into redshift space. No luminosity
dependence of clustering is present in the mock catalogs.

Fig. 7. — The volume-limited cuboidal sample analyzed in this
paper drawn from the Northern slice of the 2dFGRS (top) and
from a mock realization (bottom).

The mock catalogs represent typical volumes of space.
The real 2dF catalog, however, includes two superclusters,
one in the Northern and another in the Southern
subsample (see a thorough discussion in Croton et al.
(2004b)). The Northern supercluster is especially prominent
in the $M \in [-19, -20]$ survey; all mock samples
for this catalog have fewer galaxies than the 2dF sample.
We cut mock bricks from the mock samples, too, as we
did for the real 2dF data; the mean number of galaxies
in the mock bricks is 1.36 times smaller than in the 2dF
brick. The supercluster shows up in the correlation function,
too, enhancing correlations at intermediate scales
compared to those of the mocks (Fig. 8). The correlation
length for the brick is $r_0 = 6.8\,h^{-1}$ Mpc, only
slightly larger than the characteristic length $d = 5.0\,h^{-1}$
Mpc. We remind the reader that this is the correlation
length for redshift space; the 2dF correlation length for
real space has been estimated as $r_0 = 5.05\,h^{-1}$ Mpc
(Hawkins et al. 2003). The mean nearest-neighbor distance
is 2.3 $h^{-1}$ Mpc, showing that the galaxy distribution
is well clustered.
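To put the nearest-neighbor distance in context, it can be compared with the value expected for an unclustered (Poisson) sample of the same density. This comparison is an illustration added here and is not part of the original analysis. For a homogeneous Poisson process of number density $n$ in three dimensions, the mean nearest-neighbor distance is $\Gamma(4/3)\,(4\pi n/3)^{-1/3} \approx 0.554\,n^{-1/3}$. The sketch below evaluates this for the brick, ignoring edge effects and assuming that the quoted 2.3 is in units of $h^{-1}$ Mpc.

```python
import math

def poisson_mean_nn_distance(number_density):
    """Mean nearest-neighbour distance of a homogeneous 3D Poisson process."""
    return math.gamma(4.0 / 3.0) / (4.0 * math.pi * number_density / 3.0) ** (1.0 / 3.0)

# Brick dimensions (h^-1 Mpc) and galaxy count as quoted in the text.
n = 8487 / (254.0 * 133.1 * 31.1)
r_poisson = poisson_mean_nn_distance(n)   # h^-1 Mpc, no edge correction
r_observed = 2.3                          # h^-1 Mpc, value quoted in the text

print(f"Poisson expectation: {r_poisson:.2f} h^-1 Mpc")
print(f"observed / Poisson:  {r_observed / r_poisson:.2f}")
```

The Poisson expectation comes out noticeably larger than the measured distance, in line with the statement that the galaxies are clustered.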

5.3. Minkowski functionals of the 2dFGRS Northern sample



Fig. 8. — The two-point correlation function of the 2dF brick
(open circles) together with the average and the total deviation
range for the 22 mock catalogs. [Plot legend: 2dF190 (brick); mock
catalogs. Horizontal axis: $r$ ($h^{-1}$ Mpc).]



As mentioned above, we show the results for only one volume-limited
subsample of the 2dFGRS Northern area. Other
subsamples have either smaller volumes or smaller galaxy
densities.

We do not use the weights to correct for spectroscopic
incompleteness for the final results. We have seen that
the influence of the weights on the correlation function
$\xi(r)$ is negligible. A similar test has been performed by
Croton et al. (2004a) using counts-in-cells statistics on
mock catalogs, both complete and incomplete, reaching
a similar conclusion. We have tested the influence of
the incompleteness by calculating density fields for several
Gaussian smoothing lengths with and without the
weights, and comparing the resulting Minkowski functionals.
The differences were almost imperceptible, so we
chose the conceptually simpler procedure.
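As an illustration of what the weighted and unweighted procedures mean at the level of the grid, the following minimal sketch assigns galaxies to cells with and without inverse-completeness weights. It is not the code used for this paper: the nearest-grid-point assignment, the function name `grid_counts`, and the toy positions and completeness values are all assumptions made for the example, and the subsequent Gaussian smoothing step is omitted.

```python
import math
from collections import defaultdict

def grid_counts(positions, step, weights=None):
    """Nearest-grid-point assignment: sum the weight of every galaxy per cell."""
    if weights is None:
        weights = [1.0] * len(positions)       # unweighted: each galaxy counts once
    cells = defaultdict(float)
    for (x, y, z), w in zip(positions, weights):
        key = (math.floor(x / step), math.floor(y / step), math.floor(z / step))
        cells[key] += w
    return dict(cells)

# Hypothetical toy data: positions in h^-1 Mpc and a completeness value per galaxy.
positions = [(0.2, 0.7, 0.1), (0.9, 0.3, 0.4), (1.5, 0.2, 0.8), (3.1, 2.2, 0.5)]
completeness = [1.0, 0.8, 0.5, 1.0]
inverse_weights = [1.0 / c for c in completeness]

unweighted = grid_counts(positions, step=1.0)
weighted = grid_counts(positions, step=1.0, weights=inverse_weights)

for cell in sorted(unweighted):
    print(cell, unweighted[cell], weighted[cell])
```

The two grids differ only in cells that contain galaxies with completeness below one. When the completeness is close to one almost everywhere, the two density fields are nearly identical.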

We calculate the Minkowski functionals by sweeping
over the grid (we use a $1\,h^{-1}$ Mpc grid step). We start at
the nearby border planes, and we account for the edge
effects for bricks by not using the grid vertices at the
faraway borders. We tested this procedure by using realizations
of Gaussian random fields; although the border
effects are small, the correction works well. We estimate
the significance of the deviations of the MF curves
from those for a Gaussian random field by calculating
them for a large number of Gaussian realizations (about
1300). In order to create these realizations, we adopted
the analytical approximation for the power spectrum by
Klypin & Holtzman (1997), for parameters similar to the
concordance model ($\Omega_{\rm matter} = 0.3$, $\Omega_\Lambda = 0.7$, $\Omega_{\rm bar} = 0.026$,
$h = 0.7$).

In order to estimate the cosmic variance, we use the
22 mock bricks described above. As the distribution of
MF amplitudes is rather asymmetric, we do not compute the
variance, but show the total range of variation of the
mock MF curves. As there are 22 mock samples, this
range is close to the usual Gaussian "2 sigma" confidence
regions. Thus, the confidence regions for Gaussian
realizations given in the figures are also given for the $2\sigma$
(95%) level.
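The correspondence between the full range of 22 samples and a "2 sigma" region can be made quantitative with a standard order-statistics argument, added here as an illustration. For $n$ independent draws from any continuous distribution, a new draw falls between the smallest and the largest of them with probability $(n-1)/(n+1)$. The sketch assumes that the mock values at a fixed argument are independent and identically distributed.

```python
from fractions import Fraction

def range_coverage(n_samples):
    """Probability that a new draw lies inside [min, max] of n_samples i.i.d. draws
    from a continuous distribution (the new draw is equally likely to land in any
    of the n_samples + 1 gaps, two of which lie outside the range)."""
    return Fraction(n_samples - 1, n_samples + 1)

coverage = range_coverage(22)
print(coverage, float(coverage))
```

For 22 mocks this gives 21/23, or about 91%, which is somewhat below but close to the 95% level used for the Gaussian realizations.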

Fig. 9. — The Minkowski functional $V_1$ for the 2dFGRS Northern
brick, for Gaussian smoothing with $\sigma = 4\,h^{-1}$ Mpc (solid line).
The cosmic error is characterized by the variability of $V_1$ for 22
mock samples (shown by bars, the same smoothing). The 95%
confidence regions for the theoretical prediction, $\sigma = 4\,h^{-1}$ Mpc-smoothed
realizations of Gaussian random fields with the 'concordance
cosmology' power spectrum, are shown by dashed lines.

Fig. 10. — The Minkowski functional $V_2$ for the 2dFGRS Northern
brick, for Gaussian smoothing with $\sigma = 4\,h^{-1}$ Mpc (solid line).
The cosmic error is characterized by the variability of $V_2$ for 22
mock samples (shown by bars, the same smoothing). The 95%
confidence regions for the theoretical prediction, $\sigma = 4\,h^{-1}$ Mpc-smoothed
realizations of Gaussian random fields with the 'concordance
cosmology' power spectrum, are shown by dashed lines.

We noted above that the mock catalogs miss the supercluster
present in the real 2dF sample (see the
front left region of the 2dF brick in the upper panel
of Fig. 7), and have systematically lower density than
the real 2dF sample. The fix adopted by Croton et al.
(2004b) was to cut out the region surrounding the supercluster.
We cannot do that, as this would lead to complex
boundary corrections. For wavelet cleaning this should
not be a problem, as the algorithm will automatically follow
the density distribution. For Gaussian smoothing,
we compensated for the density difference by using 1.11
times wider smoothing lengths for the mocks than for the
2dF brick. The smoothing lengths for the Gaussian realizations
remain unscaled, of course.

We start with the first two nontrivial Minkowski functionals
(the first MF, $V_0$, is trivially Gaussian due to our
choice of the argument $\nu$). The second MF (Fig. 9), the
area of the isodensity surfaces, for Gaussian smoothing
with $\sigma = 4$ (grid units or $h^{-1}$ Mpc) barely fits into
the 95% Gaussian confidence interval (it lies completely
in the $3\sigma$ interval). It is interesting that the values of
$V_1$ for the mocks lie mostly outside of it: the isodensity
surfaces are smoother than for the real data (recall the
supercluster), and than for the Gaussian random field,
too.

The third MF (Fig. 10), the mean curvature of the
isodensity surfaces, for Gaussian smoothing with the
same $\sigma = 4$ as above also lies a bit outside of the 95%
Gaussian confidence interval, but fits completely in the
$3\sigma$ interval, not shown in the figure. The mocks do not lie well
within the 95% confidence Gaussian band, while the $V_2$
curve for the 2dF data lies close to the extreme $V_2$ values
of the mock catalogs shown by bars in the diagram.
These two figures show that Gaussian smoothing with
$\sigma = 4$ (recall that $r_0 = 6.8\,h^{-1}$ Mpc for the 2dF brick)
has already given a nearly Gaussian morphology to the
data.

As usual, the $V_3$ curves (Fig. 11) show the most detail.
The upper panel shows that the data smoothed with a
Gaussian filter of width $\sigma = 2$ are still undersmoothed,
but do not differ very much from a Gaussian random
field. Discreteness effects are more evident for the mock
samples (the peak around $\nu = 0.7$). The middle panel
demonstrates again that the density field smoothed with
a Gaussian filter of width $\sigma = 4$ can already be considered
Gaussian, and the mocks do not differ much from
Gaussian realizations either.

These two panels show how the answer to the question
of whether the density distribution has intrinsically
Gaussian morphology depends on the adopted smoothing
widths. The bottom panel shows the result for
wavelet filtering of the point distribution. This curve
is clearly non-Gaussian, showing the presence of compact
clusters for high-density isosurfaces, and a sponge-like
morphology near $\nu = 0$. However, in contrast to the
Gaussian case, the curve returns to 0 for smaller values
of $\nu$: about half of the sample space remains empty after
wavelet denoising. Gaussian smoothing, on the contrary,
tends to fill up the space. The wavelet-filtered mocks
show, in principle, similar behavior to the data. They
are only smoother, as seen from the differences around
$\nu = 2$. It is interesting that the wavelet-filtered $V_3$ curve
is similar to that for the Voronoi filament sample:
both samples are filamentary at larger scales. Wavelet
morphology returns a clear picture of the density field,
again in contrast to the Gaussian-smoothed $V_3$ for the
2dF data, where filamentarity is difficult to see.

6. CONCLUSIONS 

We have presented a new wavelet-based method to
study the morphology of the galaxy distribution:
wavelet morphology. As we have shown, it gives a unique
morphological description, and is more accurate, capturing
the details of the distribution that are destroyed
by the usual Gaussian smoothing. The code for the analysis
of wavelet morphology will be made available at
http://jstarck.free.fr.

Using special highly non-Gaussian realizations of point
processes, we have demonstrated that Gaussian smoothing
introduces Gaussian features in the morphology, and
is thus not the best tool to search for departures from
Gaussianity.

We performed wavelet-morphological analysis of the
most detailed 2dFGRS volume-limited sample and found
that it is clearly non-Gaussian. The wavelet Minkowski
functional $V_3$ finds high-density clusters, large-scale
filamentarity, and huge empty voids. A similar morphological
analysis, based on Gaussian smoothing, leads to
the conclusion that the morphology of the sample is close
to Gaussian, already for comparatively small smoothing
lengths ($\sigma > 4\,h^{-1}$ Mpc). This is a clear example of
Gaussian contamination.

The isotropic wavelet transform is optimal only for the
detection of isotropic features, but not for the detection
of filaments or walls. A clear improvement could be
made by simultaneously using several other multiscale
transforms, such as the ridgelet transform and the beamlet
transform, which are respectively well suited for walls and
filaments (Starck et al. 2005). This will be done in the
future.

Wavelet morphology also detects the large supercluster
in the 2dFGRS Northern sample, which has not been
modeled by the N-body mock catalogs. A signature of the
presence of this supercluster could be deduced from the
correlation function. Gaussian morphology does not detect
this feature.




Fig. 11. — The Minkowski functional $V_3$ for the 2dF brick. The
upper two panels show the results for Gaussian smoothing with
$\sigma = 2\,h^{-1}$ Mpc and $\sigma = 4\,h^{-1}$ Mpc, respectively (the designations are the
same as in the previous two figures). The bottom panel describes
wavelet morphology of the 2dFGRS, showing the $V_3$ curve for
the wavelet-denoised data set (thick solid line), and comparing it
with the variability range of the wavelet-denoised mocks (bars).
We show also the 95% confidence limits for 1300 realizations of
theoretical Gaussian density fields (dashed lines), and the $V_3$ data
curve (thin solid line), all obtained for the Gaussian $\sigma = 2\,h^{-1}$ Mpc
smoothing.